Abstract: Designing resource allocation strategies for
power-constrained sensor networks in the presence of correlated
data often gives rise to intractable problem formulations. In such
situations, applying well-known strategies derived under the
conditional-independence assumption may turn out to be fairly
suboptimal. In this paper, we address this issue by proposing an
adjacency-based spatial whitening scheme, where each sensor
exchanges its observation with its neighbors prior to encoding its
own private information and transmitting it to the fusion center.
We comment on the computational limitations of obtaining the
optimal whitening transformation, and propose an iterative
optimization scheme to achieve the same for large networks. We
demonstrate the efficacy of the whitening framework by considering
the example of bit allocation for distributed estimation.

I. Introduction 

Wireless sensor networks consist of spatially distributed noisy
sensors that cooperatively monitor environmental conditions.
Since the individual sensor nodes are characterized by limited
energy, bandwidth and computational capability, the task of
the fusion center (FC) is to make accurate inferences about the
phenomenon by requesting as little information from the sensor
nodes as possible [1]. Depending on the particular application
and set of constraints, the FC often has to adopt smart strategies
to collect and process data [2]. While the design of optimum
strategies is in some cases relatively easy under the assumption
of conditional independence$^1$ across sensors, it is well known
that the design gets harder, and sometimes the optimum strategy
is intractable, when correlation has to be taken into account [3].
In particular, when the sensors are geographically close, they
are expected to possess significant correlation among themselves
and the optimum strategies derived for the independent case will
no longer be optimal. In this paper, we introduce a framework
called spatial whitening (to be formalized later) to deal with this
problem.

Our framework stems from the following idea: if two sensors in a
network are highly correlated, they are also likely to be spatially
close, which means that they should be able to communicate
and exchange information among themselves in a relatively
inexpensive manner (avoiding routing overheads and long-distance
communications). Each sensor in the network can then use the
information from neighboring nodes to achieve a local whitening
transformation. If these local transformations can be
coordinated, one can aim to achieve global whitening, and the
transformed observations can then be transmitted to the FC using
optimum encoding strategies (for inference, resource allocation,
etc.) that were derived for conditionally independent scenarios.
Hence, this two-stage (whitening followed by encoding) framework
potentially enables the use of several earlier known results
in the presence of correlated noise.

$^1$ Here, 'independence' refers to the statistical independence of sensor data
conditioned on the parameter of interest. For additive Gaussian observation
noise, this is equivalent to the covariance matrix of the noise being diagonal. The
observations are still marginally dependent, since they are observing the same
parameter.

We introduce the log-determinant divergence based formulation
of spatial whitening in Section II. To illustrate the potential
usage of this framework, we consider the problem of distributed
parameter estimation, where several sensor nodes quantize
their individual observations before sending them to the FC. The
goal is to minimize the expected distortion of the estimated
parameter subject to a constraint on the total number of bits
transmitted to the FC. We demonstrate that an optimal strategy
for bit allocation (derived for the independent scenario [3])
delivers increasingly better performance with increasing degree of
whitening.

The whitening transformation described in this paper requires
local message passing, which is certainly not without
cost. However, in this paper we assign no cost to whitening,
acknowledging fully that any actual implementation of a system
would have to consider the tradeoff between the benefits of
whitening and its cost. Investigating this tradeoff is a
worthy topic for future research.

The concept of whitening, in general, has so far mostly been
addressed in a global framework. It is well known that
the Karhunen-Loeve Transform (KLT) [4] (also referred to as
Principal Components Analysis, PCA) of a random vector with
covariance matrix $\Sigma = U \Lambda U^T$ provides the unique whitening
transformation ($U^T$) that is also orthogonal. However, PCA is
ill suited for our problem, since those whitening transformations
are not local, while the orthogonality property serves no
additional purpose. The Cholesky decomposition $\Sigma = LL^T$, which
provides the unique lower triangular whitening transformation
($L^{-1}$), also requires non-local transformations. Moreover, the
lower-triangular property imposes a tree-type dependence
structure while in fact there is no natural ordering of spatially
correlated data [5]. Other sparsity-inducing decompositions like
Sparse-PCA [?] and vector Sparse-PCA [?] are exploratory$^2$ in
nature, which means that the resulting transformations are not
guaranteed to be local. In [8], a hardware-friendly technique was
proposed to achieve generic spatial whitening transformations
that were also global in scope. In distributed-KLT [?], individual
nodes observe non-overlapping portions of a random vector and
perform dimensionality reduction (without collaboration with
neighbors) for optimum reconstruction at the FC.

$^2$ The placeholders for non-zero coefficients are not known or specified
beforehand.
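The non-locality of the KLT and Cholesky whiteners can be inspected numerically. The following is a minimal sketch under stated assumptions: the six sensor positions, the toy covariance `0.1 ** dist` and the range `r = 0.3` are hypothetical choices made only for illustration and are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
pos = rng.uniform(size=(n, 2))                     # hypothetical sensor positions
dist = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)
sigma = 0.1 ** dist                                # assumed toy covariance

# KLT / PCA: Sigma = U diag(lam) U^T, so U^T decorrelates the noise.
lam, u = np.linalg.eigh(sigma)
pca_cov = u.T @ sigma @ u                          # diagonal with entries lam

# Cholesky: Sigma = L L^T, so L^{-1} whitens the noise.
l_inv = np.linalg.inv(np.linalg.cholesky(sigma))
chol_cov = l_inv @ sigma @ l_inv.T                 # identity matrix

# Count the coefficients each transform places on node pairs farther apart than r.
r = 0.3
far = dist > r
print("far-pair weights, PCA:     ", np.count_nonzero(np.abs(u.T[far]) > 1e-12))
print("far-pair weights, Cholesky:", np.count_nonzero(np.abs(l_inv[far]) > 1e-12))
```

Any non-zero count printed here corresponds to a coefficient that would require communication between sensors that are not neighbors at range `r`.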

Local communication among sensors has so far mostly been used to
address in-network inference problems. Distributed consensus
problems [9] aim at designing iterative message passing
schemes in order to compute (some linearly weighted) average
at all the nodes. Graphical model based problems involve the
selection of structured inverse-covariance matrices (Example 2.5
in [10]) and the subsequent design of message passing schemes
for (posterior) belief computation [11] at all nodes. In this paper,
we address the fixed-FC problem, where the inference
is performed at the FC rather than inside the network.

Our primary contribution in this paper is the formulation of 
a whitening framework that harnesses (some minimal) local 
communication among sensors for efficient resource allocation 
in fixed FC applications. 

II. Problem Statement 

We consider $N$ sensors in a network that is observing an
unknown, deterministic, scalar parameter of interest $\theta$ in the
presence of zero-mean, correlated Gaussian noise with covariance
$\Sigma$. Hence the sensor observations $x = [x_1, x_2, \ldots, x_N]^T$
follow

$$x \sim \mathcal{N}(\mathbf{1}\theta, \Sigma). \qquad (1)$$

Note that the sensor observations $x_k$ are conditionally
independent when $\Sigma$ is a diagonal matrix. Let the neighborhood
structure among the various nodes be represented by the $N \times N$
adjacency matrix $A$, $A_{ij} \in \{0, 1\}$, which is expected to be
sparsely populated. An entry $A_{ij} = 1$ signifies that node $i$ is a
neighbor of node $j$. A low-cost link for local communication is
assumed to be available between two neighboring nodes. Since
each node is trivially connected to itself, $A_{ii} = 1$. We denote
the set of all $A$-sparse matrices as

$$\mathcal{S}_A \triangleq \{ W \in \mathbb{R}^{N \times N} : W_{ij} = 0 \ \text{if} \ A_{ij} = 0 \}. \qquad (2)$$

Note that because $A$ is the adjacency matrix, all linear
transformations of the form

$$\tilde{x} = Wx \sim \mathcal{N}(W\mathbf{1}\theta, W\Sigma W^T), \quad W \in \mathcal{S}_A, \qquad (3)$$

can be realized relatively inexpensively through local data
transmissions, i.e., node $k$ realizes the transformation
$\tilde{x}_k = \sum_{j \in \mathcal{N}_k} W_{kj} x_j$ by collaborating with its set of
neighbors $\mathcal{N}_k = \{j_1, j_2, \ldots, j_{|\mathcal{N}_k|}\}$, i.e., all the column
indices $j$ of $A$ such that $A_{kj} = 1$.
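The local realization of an $A$-sparse transformation can be sketched as follows. The 4-node line network, the weight values and the helper name `local_transform` are hypothetical and serve only to illustrate that each node needs data from its own neighborhood alone.

```python
import numpy as np

def local_transform(w, a, x):
    """Node k computes the sum over j in N_k of W[k, j] * x[j] from neighbor data only."""
    out = np.zeros(len(x))
    for k in range(len(x)):
        neighbors = np.flatnonzero(a[k])       # column indices j with A[k, j] = 1
        out[k] = w[k, neighbors] @ x[neighbors]
    return out

# Hypothetical 4-node line network 0-1-2-3; the diagonal holds the self-links.
a = np.array([[1, 1, 0, 0],
              [1, 1, 1, 0],
              [0, 1, 1, 1],
              [0, 0, 1, 1]])
rng = np.random.default_rng(0)
w = rng.uniform(0.1, 1.0, size=(4, 4)) * a     # A-sparse weights (illustrative values)
x = rng.normal(size=4)
x_tilde = local_transform(w, a, x)             # agrees with the global product w @ x
```

Because `w` is zero wherever `a` is zero, the per-node sums reproduce the full matrix-vector product without any node touching data from outside its neighborhood.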

The goal is to find the optimal mean-preserving whitening
transformation, i.e., one for which $W\mathbf{1} = \mathbf{1}$ and $W\Sigma W^T$ is as
near to some diagonal matrix as possible. The mean-preserving
condition ensures that the problem framework is preserved,
i.e., any resource allocation algorithm previously designed for
the observation domain $x$ is applicable to the new transformed
domain $\tilde{x}$. The whitening condition helps induce conditional
independence across sensors (in some optimal sense). We choose
the log-determinant divergence [12] as our metric for matrix
nearness, a point that we will elaborate on later. The idea is that
the nodes can use the (optimally) whitened observations $\tilde{x}_k$
(instead of the original correlated observations $x_k$) as the
information to be encoded and relayed to the FC. This way, an
encoding strategy that was derived using the conditional
independence assumption across sensors can be used to enhance the
performance of the system. We will consider the application of
optimal encoding for distributed estimation in Section IV and show
the resulting improvement in performance due to the two-stage
processing. But before that, we describe our approach to finding
the optimum whitening transformation and comment on the
computational aspects.

In the domain of symmetric positive-definite $N \times N$ matrices,
the log-determinant divergence of $P$ from $Q$ is defined [12] as

$$\mathcal{L}(P; Q) \triangleq \mathrm{Tr}(Q^{-1}P) - \log\det P - N + \log\det Q. \qquad (4)$$

It is well known that $\mathcal{L}(P; Q)$ is a Bregman divergence [12]
and hence convex in $P$ for any fixed $Q$. Also, $\mathcal{L}(P; Q) \geq 0$
for all $P$ and $Q$, with equality if and only if $P = Q$. We
formulate the spatial whitening problem as finding an $A$-sparse,
mean-preserving transformation $W$ and a diagonal matrix (with
positive entries) $D$ such that the divergence $\mathcal{L}(W\Sigma W^T; D)$ is
minimized,

$$\min_{W, D} \; \mathcal{L}(W\Sigma W^T; D) \quad \text{s.t.} \quad W \in \mathcal{S}_A, \; W\mathbf{1} = \mathbf{1}. \qquad (5)$$
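As a small numerical illustration of the divergence in (4), the sketch below evaluates it for a hypothetical $2 \times 2$ matrix against the diagonal matrix that shares its diagonal. The function name `logdet_divergence` and the example values are assumptions made for illustration only.

```python
import numpy as np

def logdet_divergence(p, q):
    """L(P; Q) = Tr(Q^{-1} P) - log det P - N + log det Q for SPD P and Q."""
    n = p.shape[0]
    logdet_p = np.linalg.slogdet(p)[1]
    logdet_q = np.linalg.slogdet(q)[1]
    return np.trace(np.linalg.solve(q, p)) - logdet_p - n + logdet_q

p = np.array([[2.0, 0.5],
              [0.5, 1.0]])
d = np.diag(np.diag(p))                 # diagonal matrix sharing P's diagonal
print(logdet_divergence(p, d))          # equals log(8/7), about 0.1335
print(logdet_divergence(p, p))          # zero up to rounding
```

In this example the trace term equals $N$, so the divergence reduces to $\log\det D - \log\det P = \log(2/1.75)$, which is small because the off-diagonal correlation is weak.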



We note from definition (4) that

$$\mathcal{L}(W\Sigma W^T; D) = \mathcal{L}(D^{-1/2} W\Sigma W^T D^{-1/2}; I), \qquad (6)$$

where $I$ is the identity matrix. Using (6), we obtain an equivalent
formulation of (5),

$$\min_{Z} \; \mathcal{L}(Z\Sigma Z^T; I) \quad \text{s.t.} \quad Z \in \mathcal{S}_A, \qquad (7)$$

$$\text{where} \quad W = \delta^{-1}(Z\mathbf{1})\, Z, \quad D = \delta^{-2}(Z\mathbf{1}), \qquad (8)$$

where $\delta(\cdot)$ is the diagonalization$^3$ operator. We note that (7) is
a significantly simplified reformulation of (5). Using (4), we
define the cost function with respect to $Z$ as

$$\ell(Z) \triangleq \mathcal{L}(Z\Sigma Z^T; I) = \mathrm{Tr}(Z\Sigma Z^T) - \log\det(ZZ^T) + c_0, \qquad (9)$$

where $c_0 = -N - \log\det\Sigma$ is a constant. We refer to (7) as the
log-determinant divergence based spatial whitening problem. If
the number of non-zero elements of $A$ is $n_z(A) < N^2$, then
(7) is an optimization problem in $\mathbb{R}^{n_z(A)}$.
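The change of variables in (8) and the cost in (9) can be checked on a toy case. The sketch below uses a hypothetical $2 \times 2$ covariance and the unconstrained solution $Z = L^{-1}$; the function names `cost` and `recover_w_d` are illustrative, not from the paper.

```python
import numpy as np

def cost(z, sigma):
    """l(Z) from (9): Tr(Z Sigma Z^T) - log det(Z Z^T) + c0, c0 = -N - log det Sigma."""
    n = sigma.shape[0]
    c0 = -n - np.linalg.slogdet(sigma)[1]
    return np.trace(z @ sigma @ z.T) - np.linalg.slogdet(z @ z.T)[1] + c0

def recover_w_d(z):
    """Apply (8): W = delta^{-1}(Z 1) Z and D = delta^{-2}(Z 1)."""
    row_sums = z.sum(axis=1)            # the vector Z 1; entries must be non-zero
    return z / row_sums[:, None], np.diag(row_sums ** -2.0)

sigma = np.array([[1.0, 0.5],
                  [0.5, 1.0]])
z = np.linalg.inv(np.linalg.cholesky(sigma))   # unconstrained minimizer L^{-1}
w, d = recover_w_d(z)
print(cost(z, sigma))                   # zero up to rounding
print(w.sum(axis=1))                    # rows of W sum to one (mean preserving)
```

Dividing each row of $Z$ by its row sum is what restores the mean-preserving property, and here $W \Sigma W^T$ coincides with $D$ because $Z \Sigma Z^T = I$.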

Since $Z$ is not restricted to the set of symmetric positive-definite
matrices (denoted by $\mathbb{S}^N_{++}$), our objective function (9)
does not inherit the convexity property of well-known max-det
problems [10]. Neither does the first-order gradient condition,
written in matrix-derivative notation [13],

$$\frac{\partial \ell(Z)}{\partial Z} = 2\,(Z\Sigma - Z^{-T}) \circ A = 0, \qquad (10)$$

where $\circ$ denotes the element-wise (or Hadamard) product,
lend itself to any known closed-form solution except in the
trivial situation when $A$ is the all-ones matrix (in which case
$Z\Sigma Z^T = I$, and any orthogonal multiple of the inverse Cholesky
factor $L^{-1}$ is a solution for $Z$). In the next section, we provide
an iterative algorithm that finds (locally) optimal solutions to
problem (7). Multiple runs using good$^4$ starting points must
be used to mitigate the local-minima problem and obtain a
satisfactory solution. It may be noted here that in most of the
existing literature, matrix factorization problems of this nature
(involving sparsity/structure) are inherently non-convex and can
only guarantee locally optimal solutions [?], [6], [?].

$^3$ The function $X = \delta(x)$ is defined as $\delta : \mathbb{R}^N \to \mathbb{R}^{N \times N}$ such that $x$
corresponds to the diagonal elements of $X$, other elements being zero.

$^4$ $\ell(Z)$ is convex in the smaller subset $\mathcal{S}_A \cap \mathbb{S}^N_{++}$, the minimum within which
can be efficiently computed and considered a good starting point.
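The masked gradient expression $2(Z\Sigma - Z^{-T}) \circ A$ can be sanity-checked against central finite differences of the cost. This is only a sketch: the 4-node line network, the toy covariance and the evaluation point `z` are arbitrary assumptions, and the constant $c_0$ is dropped since it does not affect the gradient.

```python
import numpy as np

def cost(z, sigma):
    """Tr(Z Sigma Z^T) - log det(Z Z^T); the constant c0 is dropped."""
    return np.trace(z @ sigma @ z.T) - 2.0 * np.linalg.slogdet(z)[1]

def masked_gradient(z, sigma, a):
    """The expression 2 (Z Sigma - Z^{-T}) o A, restricted to the free entries."""
    return 2.0 * (z @ sigma - np.linalg.inv(z).T) * a

def numerical_gradient(z, sigma, a, h=1e-6):
    """Central differences over the entries of Z that A allows to be non-zero."""
    grad = np.zeros_like(z)
    for k, j in zip(*np.nonzero(a)):
        step = np.zeros_like(z)
        step[k, j] = h
        grad[k, j] = (cost(z + step, sigma) - cost(z - step, sigma)) / (2 * h)
    return grad

idx = np.arange(4)
sigma = 0.5 ** np.abs(idx[:, None] - idx[None, :])            # toy covariance
a = (np.abs(idx[:, None] - idx[None, :]) <= 1).astype(int)    # line network
z = np.eye(4) + 0.2 * a                                       # arbitrary A-sparse point
print(np.max(np.abs(masked_gradient(z, sigma, a) - numerical_gradient(z, sigma, a))))
```

The check uses $\log\det(ZZ^T) = 2\log|\det Z|$ for square invertible $Z$, which is why `slogdet` is applied to `z` directly.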

III. Iterative Algorithm for Spatial Whitening 

In our iterative approach to solving problem (7), we update
one row of elements in $Z$ at a time to achieve the optimum decrement
in divergence, while keeping the rest of the matrix unchanged.
This process is repeated until convergence. Each such iteration
is a convex optimization problem and we obtain closed-form
expressions for the updates. Some of the details in this section
are skipped for the sake of brevity and relegated to [14].

Optimizing (7) with respect to the row vector
$z_k^T = Z_{k, \mathcal{N}_k} \in \mathbb{R}^{|\mathcal{N}_k|}$, while keeping all the other
elements of $Z$ constant, is equivalent [14] to minimizing

$$\frac{1}{2}\, z_k^T \Sigma_k z_k - \log(z_k^T c_k), \qquad (11)$$

where $\Sigma_k \in \mathbb{R}^{|\mathcal{N}_k| \times |\mathcal{N}_k|}$ denotes the $\mathcal{N}_k$-clique covariance
matrix extracted from $\Sigma$, and the elements of $c_k$ are defined by

$$(c_k)_i = (-1)^{k + j_i} \det(Z_{-k, -j_i}), \quad i = 1, 2, \ldots, |\mathcal{N}_k|, \qquad (12)$$

with $Z_{-k, -j_i}$ denoting the matrix obtained after removing the
$k$-th row and $j_i$-th column of $Z$. The first-order gradient condition
of (11) implies $(z_k^T c_k)\, \Sigma_k z_k = c_k$, solving which one obtains
the unique extremum of (11),

$$z_k = \frac{\Sigma_k^{-1} c_k}{\sqrt{c_k^T \Sigma_k^{-1} c_k}}. \qquad (13)$$

That (13) is the minimizer follows from the convexity of (11) (the
Hessian is $\Sigma_k + (z_k^T c_k)^{-2} c_k c_k^T$, which is positive definite).

Each rank-one update of the form (13) can be efficiently
computed using the well-known Woodbury formula, details of
which are relegated to [14]. Since the overall divergence of (7)
decreases at each of the iterations of (11), and the minimum
divergence is lower bounded (see equation (10)) by

$$\inf_{Z \in \mathcal{S}_A} \ell(Z) \geq \inf_{Z} \ell(Z) = \ell(L^{-1}) = 0, \qquad (14)$$

this iterative algorithm is guaranteed to converge. It may be
noted that this kind of iterative technique is sometimes called
block-coordinate descent or terminal-by-terminal optimization [?].
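The row-by-row procedure above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it recomputes the cofactors from a full inverse at every step instead of using Woodbury rank-one updates, starts from a diagonal $Z$, and uses a hypothetical 5-node line network with a toy covariance. All function names are assumptions.

```python
import numpy as np

def cost(z, sigma):
    """Objective (9) without the constant c0."""
    return np.trace(z @ sigma @ z.T) - 2.0 * np.linalg.slogdet(z)[1]

def update_row(z, sigma, a, k):
    """Overwrite the free entries of row k with the closed-form minimizer (13)."""
    nk = np.flatnonzero(a[k])                          # neighborhood N_k
    # Cofactors of row k via the adjugate: C[k, j] = det(Z) * inv(Z)[j, k].
    # They depend only on the other rows of Z.
    cofactors = np.linalg.det(z) * np.linalg.inv(z)[:, k]
    c = cofactors[nk]
    s = np.linalg.solve(sigma[np.ix_(nk, nk)], c)      # Sigma_k^{-1} c_k
    z[k, nk] = s / np.sqrt(c @ s)

def spatial_whitening(sigma, a, sweeps=100):
    """Cycle through the rows, starting from a diagonal (hence A-sparse) Z."""
    z = np.diag(1.0 / np.sqrt(np.diag(sigma)))
    for _ in range(sweeps):
        for k in range(sigma.shape[0]):
            update_row(z, sigma, a, k)
    return z

idx = np.arange(5)
sigma = 0.5 ** np.abs(idx[:, None] - idx[None, :])            # toy covariance
a = (np.abs(idx[:, None] - idx[None, :]) <= 1).astype(int)    # line network
z0 = np.diag(1.0 / np.sqrt(np.diag(sigma)))
z = spatial_whitening(sigma, a)
print(cost(z0, sigma), cost(z, sigma))                 # cost before and after
```

Each update leaves the transformed variance of its own row equal to one, since $z_k^T \Sigma_k z_k = 1$ at the solution (13). A fixed number of sweeps is used here for simplicity; a practical version would presumably stop once the decrease in cost falls below a tolerance.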

In the remainder of this paper, we will focus on the application 
of spatial whitening to distributed estimation. 

IV. Example: Bit Allocation for Distributed Estimation

We consider the practical parameter-estimation problem
where individual sensors in a network are required to quantize
their real-valued local measurements to an appropriate length
and send the resulting discrete message to the FC, while the
latter combines all the received messages to produce a final
estimate $\hat{\theta}$. The critical resource that needs to be conserved is
the bandwidth or, equivalently, the rate of transmission. Assume
that the network consisting of $N$ nodes is allowed to transmit
only $B$ bits in total for a one-shot estimation problem. The
question then is how to judiciously allocate the $B$ bits among
the various sensors such that the resulting distortion of the
estimate is minimized at the FC [?], [15]. For the sake of simplicity,
we assume that each sensor incurs an equal per-bit cost for
transmission.



We use the quantization and bit allocation framework
outlined in [3]. All observations $x_k$ are assumed to be bounded
to a finite interval $[-U, U]$ and a uniform probabilistic
quantization is performed. An observation $x_k$ is quantized with
$b_k$ bits as follows. The quantization points $a_j^{(k)} \in [-U, U]$,
$j = 1, \ldots, 2^{b_k}$, are uniformly spaced such that
$a_{j+1}^{(k)} - a_j^{(k)} = 2U/(2^{b_k} - 1) \triangleq \Delta_k$. Suppose that
$x_k \in [a_j^{(k)}, a_{j+1}^{(k)})$. Then $x_k$ is quantized to
either $a_j^{(k)}$ or $a_{j+1}^{(k)}$ according to

$$P(m_k = a_j^{(k)}) = q, \quad P(m_k = a_{j+1}^{(k)}) = 1 - q, \qquad (15)$$

where $m_k$ is the resulting message and $q = (a_{j+1}^{(k)} - x_k)/\Delta_k$.
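The probabilistic quantizer can be sketched as below, assuming at least one bit per observation and an input already inside $[-U, U]$. The function name and the test values (`x = 3.3`, 3 bits, `u = 20`) are illustrative assumptions.

```python
import numpy as np

def probabilistic_quantize(x, bits, u, rng):
    """Unbiased quantization of x in [-u, u] using bits >= 1 bits, following (15)."""
    levels = 2 ** bits
    delta = 2.0 * u / (levels - 1)                    # spacing Delta_k
    j = min(int((x + u) // delta), levels - 2)        # index of the lower point a_j
    lower = -u + j * delta
    upper = lower + delta
    q = (upper - x) / delta                           # probability of the lower point
    return lower if rng.random() < q else upper

rng = np.random.default_rng(0)
samples = [probabilistic_quantize(3.3, bits=3, u=20.0, rng=rng) for _ in range(20000)]
print(np.mean(samples))                               # sample mean of the messages
```

With these probabilities the expected message is $q\,a_j + (1 - q)\,a_{j+1} = x_k$, so the quantizer is unbiased, and its variance $\Delta_k^2\, q(1 - q)$ is at most $\Delta_k^2/4 = U^2/(2^{b_k} - 1)^2$.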

When the noise is Gaussian and independent across sensors,
the subsequent near-optimal strategy [3] is particularly simple
and allocates

$$b_k = \mathrm{ROUND}\left[\log_2\left(\cdots\right)\right] \qquad (16)$$

bits to the $k$-th sensor, where $\sigma_k^2$ is the individual variance,
$\lambda > 0$ controls the overall sum of bits $\sum_{k=1}^{N} b_k = B$, and
the rounding is performed to the nearest integer. The idea is
that the FC broadcasts a lower value of $\lambda$ when a more precise
parameter estimate is needed. However, when the noise is
correlated, strategy (16) is suboptimal and this is where spatial
whitening can be of help. Once we perform a spatial whitening
transformation in the observation space, the idea is that we
effectively decorrelate the noise without losing any information,
and hence a strategy like (16) applied on the modified space can
still deliver near-optimal performance.

Next we state the distortion metric derived in [?], which we
shall use for comparing the performance of various schemes.
For a random vector $y \sim \mathcal{N}(\mathbf{1}\theta, C)$ that is effectively range
limited in $[-U, U]$, the mean-square error (MSE) for estimating
$\theta$ at the FC (when $y_k$ is quantized to $m_k$ using $b_k$ bits) following
the scheme in (15), is given by

$$\mathrm{MSE}(b) = \cdots \qquad (17)$$

where $Q$ is the diagonal matrix with elements
$Q_{kk} = U^2/(2^{b_k} - 1)^2$. It is assumed that the FC is using the optimally
weighted fusion rule $\hat{\theta} = (\mathbf{1}^T C^{-1} \mathbf{1})^{-1} \mathbf{1}^T C^{-1} m$ (see [16]) on
the quantized observations.
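One way to evaluate a distortion figure for the stated fusion rule is sketched below. It rests on an assumption that is not confirmed by the text: the quantization noise is treated as zero-mean, independent of the observation noise and across sensors, with variance bounded by $Q_{kk}$, so that the error variance of $\hat{\theta} = w^T m$ is at most $w^T (C + Q) w$. This may differ from the exact expression used by the authors, and the function names are hypothetical.

```python
import numpy as np

def fusion_weights(c):
    """Weights w with theta_hat = w^T m, i.e. w = C^{-1} 1 / (1^T C^{-1} 1)."""
    s = np.linalg.solve(c, np.ones(c.shape[0]))
    return s / s.sum()

def mse_upper_bound(c, bits, u):
    """w^T (C + Q) w with Q_kk = U^2 / (2^{b_k} - 1)^2 (assumed noise model)."""
    q = np.diag(u ** 2 / (2.0 ** np.asarray(bits, dtype=float) - 1.0) ** 2)
    w = fusion_weights(c)
    return w @ (c + q) @ w

c = np.eye(2)
print(mse_upper_bound(c, bits=[1, 1], u=1.0))     # 1.0: 0.5 from noise plus 0.5 from Q
print(mse_upper_bound(c, bits=[30, 30], u=1.0))   # approaches 1/(1^T C^{-1} 1) = 0.5
```

As the number of bits grows, $Q$ vanishes and the value tends to $1/(\mathbf{1}^T C^{-1} \mathbf{1})$, the error variance of the unquantized weighted estimate.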

Our simulation setup is as follows. The spatial placement
and neighborhood structure are modeled as a Random Geometric
Graph $\mathrm{RGG}(N, r)$ [17], where sensors are uniformly distributed
over a unit square with communication links present only for
pairwise distances of at most $r$. The noise is modeled with an
exponentially correlated Gaussian covariance matrix $\Sigma$,



(18) 



where $\alpha \in (0, 1)$ is indicative of the degree of spatial correlation.
A smaller value of $\alpha$ indicates lower correlation, with $\alpha \to 0$
signifying completely independent observations.

We consider $N = 50$ nodes and the particular RGG used for
our simulation is depicted in Figure 1. The individual sensor
variances are generated by uniform random numbers in the
range $[0.5, 1.5]$ and the correlation parameter is $\alpha = 0.02$. The
range limit of observations is taken as $U = 20$.
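A setup of this kind can be generated as in the sketch below. The graph construction follows the description above, but the covariance form $\sigma_i \sigma_j \alpha^{d_{ij}}$ is an assumption chosen because it is exponential in distance and becomes diagonal as $\alpha \to 0$; it is not taken from the source. The seed, the link radius of 0.18 and the function names are likewise illustrative.

```python
import numpy as np

def random_geometric_graph(n, r, rng):
    """Uniform positions on the unit square; link nodes at distance at most r."""
    pos = rng.uniform(size=(n, 2))
    dist = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)
    adjacency = (dist <= r).astype(int)        # d_kk = 0, so self-links are included
    return pos, dist, adjacency

def assumed_covariance(dist, variances, alpha):
    """ASSUMED form sigma_i * sigma_j * alpha**d_ij; not taken from the source."""
    std = np.sqrt(variances)
    return np.outer(std, std) * alpha ** dist

rng = np.random.default_rng(0)
pos, dist, a = random_geometric_graph(n=50, r=0.18, rng=rng)
variances = rng.uniform(0.5, 1.5, size=50)
sigma = assumed_covariance(dist, variances, alpha=0.02)
print(a.sum(axis=1).mean())                    # average neighborhood size
```

The adjacency matrix `a` then defines the sparsity pattern for the whitening transformation, while `dist` gives the pairwise distances $d_{ij}$.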

In Figure 2, we compare the distortion performance $\mathrm{MSE}(b)$
(17) corresponding to the three scenarios when strategy (16) is
applied to various transformations of the data. The line labeled
'not whitened' corresponds to the naive case of strategy (16) being
directly applied to the observation space $x$. As expected, the
performance of this scheme is suboptimal. In the spatially whitened
cases, we use the transformed variable (see (8))

$$\tilde{x} = W_r x, \quad W_r = \delta^{-1}(Z_r \mathbf{1})\, Z_r, \quad D_r = \delta^{-2}(Z_r \mathbf{1}), \qquad (19)$$

where $Z_r$ is the minimum-divergence solution of (7) subject to
the constraints that $[Z_r]_{ij} = 0$ if $d_{ij} > r$. We note that

$$\tilde{x} \sim \mathcal{N}(\mathbf{1}\theta, W_r \Sigma W_r^T), \qquad (20)$$

which implies that $\tilde{x}_k$ possesses the same mean as the signal, but
is corrupted only by approximately independent Gaussian noise
with variance $\gamma_k^2 = \mathrm{Var}(\tilde{x}_k) \approx [D_r]_{kk} = ([Z_r \mathbf{1}]_k)^{-2}$. Strategy
(16) is then applied on the whitened space $\tilde{x}$ with $\sigma_k^2$ replaced by $\gamma_k^2$
in Equation (16). We have shown the performance for $r = 0.1$
and $r = 0.5$ in Figure 2. As the range $r$ increases, we have more
whitening and consequently the performance improves. Thirdly,
we display the results for the orthogonally whitened (or PCA) case,
where we consider the well-known eigenvalue decomposition
$\Sigma = U \Lambda U^T$ and consequently the whitening transformation

$$\tilde{x} = D^{-1} U^T x, \quad D = \delta(U^T \mathbf{1}). \qquad (21)$$

Since PCA fully whitens $\Sigma$ (by definition), its distortion is
expected to provide a lower bound on that of the other schemes.
This is confirmed by Figure 2. However, since the weights
in PCA are not designed to be zero for sensors that are far
apart, such a transformation may be impossible to realize in
a power-constrained network and is hence not realistic. Finally,
the Cramer-Rao lower bound $\mathrm{CRB} = 1/(\mathbf{1}^T \Sigma^{-1} \mathbf{1})$ is also
displayed, which confirms that in the asymptotic regime with
sufficient quantization bits per sensor, all these schemes perform
identically.
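The mean-preserving PCA transformation and the Cramer-Rao bound can be checked on a toy covariance. The sketch below assumes that no eigenvector of $\Sigma$ is orthogonal to the all-ones vector, since otherwise $D = \delta(U^T \mathbf{1})$ has a zero on its diagonal and cannot be inverted. The $2 \times 2$ matrix and the function name are illustrative assumptions.

```python
import numpy as np

def pca_mean_preserving(sigma):
    """Return W = D^{-1} U^T with D = delta(U^T 1), and the whitened variances."""
    lam, u = np.linalg.eigh(sigma)                 # Sigma = U diag(lam) U^T
    d = u.T @ np.ones(sigma.shape[0])              # diagonal of D; must be non-zero
    return u.T / d[:, None], lam / d ** 2

sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
w, variances = pca_mean_preserving(sigma)
crb = 1.0 / (np.ones(2) @ np.linalg.solve(sigma, np.ones(2)))
print(w @ np.ones(2))                              # all ones: the mean is preserved
print(1.0 / np.sum(1.0 / variances), crb)          # both equal 0.875
```

The inverse-variance weighted estimate formed from the whitened observations attains the same variance as the bound computed from the original covariance, which illustrates that an invertible mean-preserving transformation loses no information before quantization.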




Fig. 1. Random Geometric Graph with 50 nodes, used for the example in Section
IV. Edges are shown for pairwise distances less than 0.18.

V. Conclusion 

In this paper, we have considered a two-stage framework for
distributed signal processing in the presence of spatially
correlated data. The first stage is designed to whiten the observation
space by communicating only with neighboring sensors. In the
second stage, each sensor encodes these whitened observations
following well-known strategies derived using the conditional
independence assumption. We considered the example of bit allocation
for distributed estimation to demonstrate the potential
applicability of this framework. Many research questions remain to be
addressed. Some of them are efficient computation of the spatial
whitening transformation, cost considerations for the whitening
stage, extension of the framework to vector parameter scenarios
and potential applicability in hypothesis testing problems.

Fig. 2. Distortion reduction achieved by spatial whitening.